374 research outputs found

CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data

Author: Alves Joao M.
Kozlov Alexey
Posada David
Stamatakis Alexandros
Publication venue: Springer Fachmedien Wiesbaden
Publication date: 26/01/2022

We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy into RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed. CellPhy is freely available a
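To make the "16 diploid states" in the abstract above concrete, the following minimal sketch enumerates one plausible state space. It assumes the states are ordered pairs of the four nucleotides; the names `GENOTYPES`, `is_heterozygous`, and `dropout_outcomes` are hypothetical and are not taken from CellPhy or RAxML-NG. The dropout helper is a deliberately simplified illustration of allelic dropout, not the model's actual error parameterization.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Assumption: a diploid genotype is an ordered pair of nucleotides,
# which yields 4 * 4 = 16 distinct states.
GENOTYPES = ["".join(pair) for pair in product(NUCLEOTIDES, repeat=2)]


def is_heterozygous(genotype: str) -> bool:
    """Return True when the two alleles differ."""
    return genotype[0] != genotype[1]


def dropout_outcomes(genotype: str) -> set[str]:
    """Homozygous genotypes that could be observed if one allele drops out.

    Simplified illustration: losing one allele of a heterozygous genotype
    makes the cell look homozygous for the remaining allele.
    """
    if not is_heterozygous(genotype):
        return {genotype}
    return {genotype[0] * 2, genotype[1] * 2}


heterozygous_count = sum(is_heterozygous(g) for g in GENOTYPES)
```

Under this assumption, 12 of the 16 states are heterozygous, and `dropout_outcomes("AG")` yields the two homozygous genotypes `AA` and `GG`.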

KITopen

PubMed Central

The Free Lunch is not over yet—systematic exploration of numerical thresholds in maximum likelihood phylogenetic inference

Author: Haag Julia
Hübner Lukas
Kozlov Alexey M.
Stamatakis Alexandros
Publication venue: Oxford University Press
Publication date: 18/10/2023

Maximum likelihood (ML) is a widely used phylogenetic inference method. ML implementations heavily rely on numerical optimization routines that use internal numerical thresholds to determine convergence. We systematically analyze the impact of these threshold settings on the log-likelihood and runtimes for ML tree inferences with RAxML-NG, IQ-TREE, and FastTree on empirical datasets. We provide empirical evidence that we can substantially accelerate tree inferences with RAxML-NG and IQ-TREE by changing the default values of two such numerical thresholds. At the same time, altering these settings does not significantly impact the quality of the inferred trees. We further show that increasing both thresholds accelerates the RAxML-NG bootstrap without influencing the resulting support values. For RAxML-NG, increasing the likelihood thresholds ε_LnL and ε_brlen to 10 and 10^3, respectively, results in an average tree inference speedup of 1.9 ± 0.6 on Data collection 1, 1.8 ± 1.1 on Data collection 2, and 1.9 ± 0.8 on Data collection 2 for the RAxML-NG bootstrap compared to the runtime under the current default setting. Increasing the likelihood threshold ε_LnL to 10 in IQ-TREE results in an average tree inference speedup of 1.3 ± 0.4 on Data collection 1 and 1.3 ± 0.9 on Data collection 2
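The role of a convergence threshold can be illustrated with a toy optimizer. The sketch below is not the optimization routine of RAxML-NG, IQ-TREE, or FastTree; it is plain gradient ascent on a hypothetical one-parameter "log-likelihood", and the function names, learning rate, and threshold values are assumptions chosen only for illustration. It shows the trade-off the abstract describes: a larger threshold stops the search sooner, at the cost of ending farther from the optimum.

```python
def toy_log_likelihood(x: float) -> float:
    """Hypothetical concave score with its maximum at x = 2."""
    return -100.0 * (x - 2.0) ** 2


def optimize(epsilon: float, x: float = 0.0, learning_rate: float = 0.001,
             max_iterations: int = 1000) -> tuple[float, int]:
    """Gradient ascent that stops once the score improves by less than epsilon."""
    current = toy_log_likelihood(x)
    for iteration in range(1, max_iterations + 1):
        gradient = -200.0 * (x - 2.0)      # derivative of toy_log_likelihood
        x += learning_rate * gradient
        updated = toy_log_likelihood(x)
        if updated - current < epsilon:    # convergence test on the score gain
            return x, iteration
        current = updated
    return x, max_iterations


# Illustrative threshold values only, not the defaults of any tool.
x_loose, iterations_loose = optimize(epsilon=10.0)
x_strict, iterations_strict = optimize(epsilon=0.1)
```

Comparing `iterations_loose` with `iterations_strict` shows the looser threshold finishing in fewer steps, while `x_strict` lands closer to the optimum at 2.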

KITopen

Exploring parallel MPI fault tolerance mechanisms for phylogenetic inference with RAxML-NG

Author: Hespe Demian
Hübner Lukas
Kozlov Alexey M.
Sanders Peter
Stamatakis Alexandros
Publication venue: Oxford University Press
Publication date: 26/05/2021

Motivation: Phylogenetic trees are now routinely inferred on large-scale high-performance computing systems with thousands of cores, as the parallel scalability of phylogenetic inference tools has improved over the past years to cope with the molecular data avalanche. Thus, the parallel fault tolerance of phylogenetic inference tools has become a relevant challenge. To this end, we explore parallel fault tolerance mechanisms and algorithms, the software modifications required and the performance penalties induced via enabling parallel fault tolerance by example of RAxML-NG, the successor of the widely used RAxML tool for maximum likelihood-based phylogenetic tree inference. Results: We find that the slowdown induced by the necessary additional recovery mechanisms in RAxML-NG is on average 1.00 ± 0.04. The overall slowdown by using these recovery mechanisms in conjunction with a fault-tolerant Message Passing Interface implementation amounts to on average 1.7 ± 0.6 for large empirical datasets. Via failure simulations, we show that RAxML-NG can successfully recover from multiple simultaneous failures, subsequent failures, failures during recovery and failures during checkpointing. Recoveries are automatic and transparent to the user

KITopen

PubMed Central

Phylogeny-aware identification and correction of taxonomically mislabeled sequences

Author: Gloeckner Frank Oliver
Kozlov Alexey M.
Stamatakis Alexandros
Yilmaz Pelin
Zhang Jiajie
Publication venue: Oxford University Press
Publication date: 26/04/2016

Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (‘mislabels’) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa

KITopen

PubMed Central

MPG.PuRe

ParGenes: a tool for massively parallel model selection and phylogenetic tree inference on thousands of genes

Author: Kozlov Alexey M.
Morel Benoit
Stamatakis Alexandros
Publication venue: Oxford University Press
Publication date: 12/06/2019

KITopen

ModelTest-NG: A New and Scalable Tool for the Selection of DNA and Protein Evolutionary Models

Author: Darriba Diego
Flouri Tomas
Kozlov Alexey M.
Morel Benoit
Posada David
Stamatakis Alexandros
Publication venue: Oxford University Press
Publication date: 01/01/2019

ModelTest-NG is a reimplementation from scratch of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest but equally accurate, and it introduces several new features, such as ascertainment bias correction, mixture and free-rate models, and the automatic processing of single partitions

Repositorio da Universidade da Coruña

Investigo

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

KITopen

UCL Discovery

The structure of heat-resisting alloy modified by thermal treatment and alloyed by rhenium and lanthanum

Author: Klopotov A. A.
Kondratyuk Alexey Alekseevich
Kozlov E. V.
Kuznetsov M. E.
Nikonenko Elena Leonidovna
Popova N. A.
Publication venue: IOP Publishing
Publication date: 01/01/2016

The paper presents scanning and transmission electron microscope investigations of the structure, phase composition, and morphology of a heat-resisting alloy modified by thermal treatment and additionally alloyed with rhenium and lanthanum. The rhenium alloy is obtained by the directional crystallization technique. The structural investigations are carried out for three alloy states: 1) original (after the directional crystallization); 2) 1150°C annealing for 1 h and 1100°C annealing for 480 h; 3) 1150°C annealing for 1 h and 1100°C annealing for 1430 h. It is shown that the fcc-based γ- and γ′-phases are primary in all states of the alloy. The γ′-phase has the L1₂ structure, while the γ-phase is a disordered phase. Rhenium and lanthanum are phase-forming elements. Investigations show that high-temperature annealing modifies the structural and phase conditions of the heat-resisting alloy

Electronic archive of Tomsk Polytechnic University

Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult

Author: Barbera Pierre
Bettisworth Ben
Czech Lucas
Hübner Lukas
Kostaki Evangelia-Georgia
Kozlov Alexey M.
Lutteropp Sarah
Mamais Ioannis
Morel Benoit
Paraskevis Dimitrios
Pavlidis Pavlos
Serdari Dora
Stamatakis Alexandros
Publication venue: Oxford University Press
Publication date: 07/06/2021

Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution

KITopen

RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference

Author: Alexandros Stamatakis
Alexey M Kozlov
Benoit Morel
Diego Darriba
Tomáš Flouri
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2019

Motivation: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. Results: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQ-Tree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. Availability and implementation: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/

Crossref

UCL Discovery

A roadmap for global synthesis of the plant tree of life

Author: Antonelli Alexandre
Baker William J.
Bennett Dominic J.
Botigue Laura R.
Burleigh J. Gordon
Dodsworth Steven
Eiserhardt Wolf L.
Enquist Brian J.
Forest Felix
Kim Jan T.
Kozlov Alexey M.
Leitch Ilia J.
Maitner Brian S.
Mirarab Siavash
Perez-Escobar Oscar A.
Piel William H.
Pokorny Lisa
Rahbek Carsten
Sandel Brody
Smith Stephen A.
Stamatakis Alexandros
Vos Rutger A.
Warnow Tandy
Publication venue: Wiley
Publication date: 01/01/2018

Providing science and society with an integrated, up-to-date, high quality, open, reproducible and sustainable plant tree of life would be a huge service that is now coming within reach. However, synthesizing the growing body of DNA sequence data in the public domain and disseminating the trees to a diverse audience are often not straightforward due to numerous informatics barriers. While big synthetic plant phylogenies are being built, they remain static and become quickly outdated as new data are published and tree-building methods improve. Moreover, the body of existing phylogenetic evidence is hard to navigate and access for non-experts. We propose that our community of botanists, tree builders, and informaticians should converge on a modular framework for data integration and phylogenetic analysis, allowing easy collaboration, updating, data sourcing and flexible analyses. With support from major institutions, this pipeline should be re-run at regular intervals, storing trees and their metadata long-term. Providing the trees to a diverse global audience through user-friendly front ends and application development interfaces should also be a priority. Interactive interfaces could be used to solicit user feedback and thus improve data quality and to coordinate the generation of new data. We conclude by outlining a number of steps that we suggest the scientific community should take to achieve global phylogenetic synthesis

Shared Research Repository

Copenhagen University Research Information System

University of Bedfordshire Repository

Deep Blue Documents at the University of Michigan